Histogram Sort with Sampling
Abstract
To minimize data movement, state-of-the-art parallel sorting algorithms use sampling and histogramming techniques to partition keys prior to redistribution. Sampling enables partitioning to be done using a representative subset of the keys, while histogramming enables evaluation and iterative improvement of a given partitioning. We introduce Histogram sort with sampling (HSS), which combines sampling and histogramming techniques to find high-quality partitions with minimal data movement and high practical performance. Our algorithm requires a factor of Θ(log(p)/log log(p)) less communication than the best known (recently introduced) alternative for finding such a partitioning, and substantially less than standard variants of Sample sort and Histogram sort. We provide a distributed-memory implementation of the proposed algorithm, compare its performance to two existing implementations, and present a brief application study showing the benefit of the new algorithm.
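As a rough illustration of how sampling and histogramming can work together, the following single-process Python sketch simulates p processors as a list of key chunks. It is not the authors' implementation and does not reproduce the sampling ratios or communication bounds of HSS: the function names, the fixed `samples_per_chunk`, the tolerance `eps`, and the assumption of distinct keys are all illustrative choices. Each round samples candidate keys only from the intervals that still bracket an unresolved splitter, histograms them (computes their global ranks), and narrows the intervals.

```python
import bisect
import math
import random


def histogram(sorted_chunks, probes):
    """Global rank of each probe: keys smaller than it, summed over chunks.
    In a distributed setting this sum would be a reduction across processors."""
    return [sum(bisect.bisect_left(c, q) for c in sorted_chunks) for q in probes]


def find_splitters(chunks, eps=0.05, samples_per_chunk=16, max_rounds=20, seed=0):
    """Illustrative splitter search (hypothetical helper); assumes distinct keys."""
    rng = random.Random(seed)
    p = len(chunks)
    local = [sorted(c) for c in chunks]          # each "processor" sorts locally
    n = sum(len(c) for c in local)
    targets = [i * n // p for i in range(1, p)]  # ideal global rank of splitter i
    tol = eps * n / p                            # allowed rank error per splitter
    lo = [min(c[0] for c in local if c)] * (p - 1)
    hi = [max(c[-1] for c in local if c)] * (p - 1)
    best = [None] * (p - 1)
    best_err = [math.inf] * (p - 1)

    for _ in range(max_rounds):
        open_ids = [i for i in range(p - 1) if best_err[i] > tol]
        if not open_ids:
            break
        # Sampling step: draw candidates only from still-open key intervals.
        probes = set()
        for chunk in local:
            for i in open_ids:
                a = bisect.bisect_left(chunk, lo[i])
                b = bisect.bisect_right(chunk, hi[i])
                for _ in range(samples_per_chunk if a < b else 0):
                    probes.add(chunk[rng.randrange(a, b)])
        probes = sorted(probes)
        # Histogramming step: evaluate every candidate against the full data.
        ranks = histogram(local, probes)
        for i in open_ids:
            for q, r in zip(probes, ranks):
                err = abs(r - targets[i])
                if err < best_err[i]:
                    best[i], best_err[i] = q, err
                if r <= targets[i] and q > lo[i]:
                    lo[i] = q                    # tighten lower bracket
                if r >= targets[i] and q < hi[i]:
                    hi[i] = q                    # tighten upper bracket
    return best
```

Calling `find_splitters(chunks)` on a list of per-processor key lists returns p - 1 splitter keys; keys would then be redistributed so that processor i receives the keys between splitters i - 1 and i. Only the sampled candidates and their rank counts are exchanged during the search, which is the source of the reduced data movement the abstract refers to.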
Related resources
High-speed parallel external sorting of data with arbitrary distribution
Many parallel sorting algorithms for (external) disk data have been reported, such as NOWsort, SPsort, and hill sort. They all reduce the execution time compared to some known sequential sort; however, they differ in speed, throughput, and cost-effectiveness. Mostly they deal with data that are uniformly distributed in their value range. If we divide and redistribute data to pro...
Distribution-Insensitive Parallel External Sorting on PC Clusters
Many parallel external sorting algorithms have been reported, such as NOW-Sort, SPsort, and hill sort. They sort large-scale data stored on disk, but they differ in speed, throughput, and cost-effectiveness. Mostly they deal with data that are uniformly distributed in their value range. Few research results have yet been reported for parallel external sort for data w...
Sorting On A Graphics Processing Unit (GPU)
2.1 Graphics Processing Units; 2.2 Sorting Numbers on GPUs; 2.2.1 SDK Radix Sort Algorithm; 2.2.1.1 Step 1–Sorting tiles ...
Node Histogram vs. Edge Histogram: A Comparison of PMBGAs in Permutation Domains
Previous papers have proposed an algorithm called the edge histogram sampling algorithm (EHBSA) that models the relative relation between two nodes (edge) of permutation strings of a population within the PMBGA framework for permutation domains. This paper proposes another histogram based model we call the node histogram sampling algorithm (NHBSA). The NHBSA models node frequencies at each abso...
MP-sort: Sorting at Scale on Blue Waters – for a Cosmological Simulation
We implement and investigate a parallel sorting algorithm (MP-sort) on Blue Waters. MP-sort sorts distributed array items with non-unique integer keys into a new distributed array. The sorting algorithm belongs to the family of partition sorting algorithms: the target storage space of a parallel computing rank is represented by a histogram bin whose edges are determined by partitioning the inpu...
Publication date: 2018